Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PushCSC for SparsePage. #4193

Merged
merged 3 commits into from
Mar 1, 2019
Merged

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Feb 28, 2019

  • Move Push* definitions into cc file.
  • Add std:: prefix to size_t to make clang++ happy.
  • Address monitor count == 0.

close #4037.
close #4130.

* Move Push* definitions into cc file.
* Add std:: prefix to `size_t` make clang++ happy.
* Address monitor count == 0.
@trivialfis
Copy link
Member Author

@hcho3 I used your testing case but added a SLOW prefix to it.

@trivialfis
Copy link
Member Author

@hcho3 It seems #3622 shows up on Travis now. Could you help taking a look?

@trivialfis
Copy link
Member Author

@hcho3 Sorry, not the same. Will look into this myself.

@hcho3
Copy link
Collaborator

hcho3 commented Feb 28, 2019

@trivialfis Wow, thanks so much for the effort! I'll go ahead and review it today.

cc @CodingCat

Copy link
Collaborator

@hcho3 hcho3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PushCSC() function looks good to me. About the test cases, I wonder whether we should actually allow duplicate entries. If we have two entries with the same (row, column) coordinates that have different values, which value "wins"?

@trivialfis
Copy link
Member Author

@hcho3

If we have two entries with the same (row, column) coordinates that have different values, which value "wins"

I'm not sure what you mean. But as we talked before, it's worth thinking about some architectural changes to DMatrix itself. I will try to make a list for what are expected for future DMatrix after this release.

@hcho3
Copy link
Collaborator

hcho3 commented Mar 1, 2019

@trivialfis See this vector:

std::vector<xgboost::Entry> vec = { {0, 1.0f}, {0, 2.0f}, {2, -0.2f} };

where the value of feature 0 is undefined (is it 1 or 2)?

I suppose this is a limitation of the CSR matrix format. There is an implicit trust that you won't have two feature values with the same feature index.

Copy link
Collaborator

@hcho3 hcho3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this for now. We can discuss DMatrix format after 0.82 release.

@trivialfis trivialfis merged commit 7ea5675 into dmlc:master Mar 1, 2019
@trivialfis trivialfis deleted the fix/external-memory branch March 1, 2019 17:58
@lock lock bot locked as resolved and limited conversation to collaborators May 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants